On noise masking for automatic missing data speech recognition: A survey and discussion

Authors

  • Christophe Cerisara
  • Sébastien Demange
  • Jean Paul Haton
Abstract

Automatic speech recognition (ASR) has reached very high levels of performance in controlled situations. However, the performance degrades significantly when environmental noise occurs during the recognition process. Nowadays, the major challenge is to reach a good robustness to adverse conditions, so that automatic speech recognizers can be used in real situations. Missing data theory is a very attractive and promising approach. Unlike other denoising methods, missing data recognition does not match the whole data with the acoustic models, but instead considers part of the signal as missing, i.e. corrupted by noise. While speech recognition with missing data can be handled efficiently by methods such as data imputation or marginalization, accurately identifying missing parts (also called masks) remains a very challenging task. This paper reviews the main approaches that have been proposed to address this problem. The objective of this study is to identify the mask estimation methods that have been proposed so far, and to open this domain up to other related research, which could be adapted to overcome this difficult challenge. In order to restrict the range of methods, only the techniques using a single microphone are considered. © 2006 Elsevier Ltd. All rights reserved.
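
The two ideas named in the abstract, a mask that marks time-frequency units as reliable or missing and marginalization over the missing units, can be illustrated with a minimal Python sketch. It is not the method of any surveyed paper. It assumes an "oracle" setting in which speech and noise power are known separately (which is not the case in practice and is exactly why mask estimation is hard), a 0 dB local SNR threshold, and a single diagonal-covariance Gaussian as the acoustic model. The function names `oracle_binary_mask` and `marginal_log_likelihood` are hypothetical.

```python
import math

def oracle_binary_mask(speech_power, noise_power, snr_threshold_db=0.0):
    """Mark each time-frequency unit True (reliable) when its local SNR
    exceeds the threshold, False (missing) otherwise.
    Assumption: speech and noise power are known separately (oracle case)."""
    eps = 1e-12  # floor to avoid log of zero or division by zero
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            local_snr_db = 10.0 * math.log10(max(s, eps) / max(n, eps))
            row.append(local_snr_db > snr_threshold_db)
        mask.append(row)
    return mask

def marginal_log_likelihood(frame, reliable, mean, var):
    """Log-likelihood of one frame under a diagonal Gaussian, with the
    missing components integrated out. For a diagonal covariance this
    amounts to summing only over the reliable components."""
    total = 0.0
    for x, ok, m, v in zip(frame, reliable, mean, var):
        if ok:
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total

# Two frames, two frequency bands; the noise power is 1.0 everywhere.
mask = oracle_binary_mask([[4.0, 1.0], [0.5, 9.0]], [[1.0, 1.0], [1.0, 1.0]])

# The second component is masked, so its large mismatch with the model is ignored.
ll = marginal_log_likelihood([1.0, 100.0], mask[0], [1.0, 0.0], [1.0, 1.0])
```

In this sketch a unit whose local SNR equals the threshold exactly is treated as missing. A real system would have to estimate the mask from the noisy signal alone, which is the problem the survey addresses.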

Similar resources

Exploiting confidence measures for missing data speech recognition

Automatic speech recognition in highly non-stationary noise, for instance with a competing speaker or background music, is an extremely challenging and still unsolved problem. Missing data recognition is a robust approach that is well adapted to this kind of noise. A standard missing data technique consists in marginalizing out, from the observation likelihoods computed during decoding, the con...

A New Method for Robust Speech Recognition Based on Missing Data Using a Bidirectional Neural Network

The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that present a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced by the remaining components and statistical ...

Asr-driven Binary Mask Estimation for Robust Automatic Speech Recognition

Additive noise has long been an issue for robust automatic speech recognition (ASR) systems. One approach to noise robustness is the removal of noise information through segregation by binary time-frequency masks; each time-frequency unit in a spectro-temporal representation of the speech signal is labeled either noise-dominant or signal-dominant. The noise-dominant units are masked and their e...

Spectral Reconstruction and Noise Model Estimation Based on a Masking Model for Noise Robust Speech Recognition

An effective way to increase noise robustness in automatic speech recognition (ASR) systems is feature enhancement based on an analytical distortion model that describes the effects of noise on the speech features. One such distortion model, reported to achieve a good trade-off between accuracy and simplicity, is the masking model. Under this model, speech distortion caused by en...

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of emotional states from speech in noisy conditions has in recent years become an important research topic in emotional speech recognition. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...

Journal title:
  • Computer Speech & Language

Volume 21

Publication date: 2007